This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Notes:
- If you are using this document in .Rmd format, you can press the “play” button in the upper-right corner of a code chunk (i.e., the gray section) to run the code and see the results.
- If you are using this document in its .html format, you can copy and paste the code into an R script to see how it works.
We can use the = sign to assign values to variables but
the <- operator is conventionally preferred.
# "Initializing" a variable
# Variable <- Value
A <- 1
B <- 2
C <- 3
print(A)
## [1] 1
print(B)
## [1] 2
print(C)
## [1] 3
We can perform basic math operations by using the variables instead of the numbers assigned to them:
D <- A + B
D
## [1] 3
We can also overwrite variables:
# We can also overwrite variables
C <- 2 * A * B ^ B
C
## [1] 8
Note how the value of C changes from 3 to 8, subsequently changing the results of succeeding operations:
C + D
## [1] 11
Using variables:
Variables do not have to be numbers.
There are multiple data types in R and in other computer languages. The standard data types we will often encounter are:
# Numeric (subcategories: integer and double)
E <- 525600
# Logical: TRUE or FALSE
Outcome <- TRUE
# Character (also known as string): written with quotation marks around the text
BestProfession <- "Engineering"
We can print the variables to see the values assigned to them.
print(E)
## [1] 525600
print(Outcome)
## [1] TRUE
print(BestProfession)
## [1] "Engineering"
To find out the type of a variable, we can use class():
class(E)
## [1] "numeric"
class(Outcome)
## [1] "logical"
class(BestProfession)
## [1] "character"
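The comment on the numeric type mentions two subcategories, integer and double. A minimal sketch of how to tell them apart, assuming we use the `L` suffix to request an integer (the variable names here are illustrative and not part of the original script):

```r
# A number typed without a suffix is stored as a double by default
minutes_double <- 525600
# Adding the L suffix asks R to store the value as an integer
minutes_integer <- 525600L

class(minutes_double)    # "numeric"
typeof(minutes_double)   # "double"
class(minutes_integer)   # "integer"
typeof(minutes_integer)  # "integer"

# is.numeric() is TRUE for both subcategories
is.numeric(minutes_double)
is.numeric(minutes_integer)
```

In everyday work the distinction rarely matters, but it explains why class() and typeof() sometimes report different words for the same value.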
A factor in R is a data structure used to handle categorical variables. It is especially useful when the variable represents a fixed number of categories (example: male vs female; low-medium-high).
Factors are internally stored as integers with labels. They can also be “ordered” or “unordered”.
# Creating a factor
gender <- factor(c("Male", "Female", "Female", "Male", "Non-binary"))
# Check levels
levels(gender)
## [1] "Female" "Male" "Non-binary"
We can also specify the levels and orders of factors.
# Custom levels and order
level_ordered <- factor(c("Low", "High", "Medium"),
                        levels = c("Low", "Medium", "High"),
                        ordered = TRUE)
level_ordered
## [1] Low High Medium
## Levels: Low < Medium < High
# Check if it's ordered
is.ordered(level_ordered)
## [1] TRUE
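To see what "stored as integers with labels" and "ordered" mean in practice, here is a small sketch. It rebuilds the same ordered factor so that it runs on its own; the comparisons shown are one possible way to use the ordering, not the only one.

```r
# Same ordered factor as above, rebuilt so this sketch is self-contained
level_ordered <- factor(c("Low", "High", "Medium"),
                        levels = c("Low", "Medium", "High"),
                        ordered = TRUE)

# The integer codes follow the order of the levels, not the alphabet
as.integer(level_ordered)   # 1 3 2

# Ordered factors can be compared against one of their levels
level_ordered >= "Medium"   # FALSE TRUE TRUE

# Sorting also respects the level order
sort(level_ordered)
```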
# From character to factor
x <- as.factor(c("A", "B", "A"))
x
## [1] A B A
## Levels: A B
class(x)
## [1] "factor"
# From factor to character
x <- as.character(x)
x
## [1] "A" "B" "A"
class(x)
## [1] "character"
# From character to numeric
y <- c("1", "2", "3")
y <- as.numeric(y)
y
## [1] 1 2 3
class(y)
## [1] "numeric"
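Type conversion has two common pitfalls worth trying out. The sketch below uses made-up values (`f` and `z` are illustrative names) to show what happens when a factor with number-like labels, or text that is not a number, is passed to as.numeric().

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

# Converting directly returns the internal integer codes, not the labels
as.numeric(f)                 # 1 2 1

# Converting to character first recovers the values we probably wanted
as.numeric(as.character(f))   # 10 20 10

# Text that is not a number becomes NA (R also issues a warning,
# which is silenced here only to keep the example output clean)
z <- suppressWarnings(as.numeric(c("1", "two", "3")))
z                             # 1 NA 3
```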
Use variables in a script to solve for the number of liters of water needed annually by a town.
How much water does the town use per year?
# There are many ways to approach this problem. Here's one example:
# Step 1: Define the variables
Population <- 10000 # people in the town
Population_LPD <- 120 # water consumption per person (liters per day)
Golf_course_LPM <- 1400000 # water consumption of the golf course per month (liters)
Golf_course_no <- 3 # number of golf courses in the town
Days_year <- 365 # number of days in a year
Days_month <- 30 # number of days in a month
# Step 2: Do the computation
# Compute for the water consumption by:
# a. All people
People_use <- Population * Population_LPD * Days_year
# b. Golf course
Golf_use <- Golf_course_no * ((Golf_course_LPM/Days_month) * Days_year)
# c. Total use
Total_use <- People_use + Golf_use
print(Total_use)
## [1] 489100000
Can we add text (characters) to the printed output so that it provides more information?
Using the paste() function allows us to do that.
print(paste("The total water consumption in the town is", Total_use, "liters per year."))
## [1] "The total water consumption in the town is 489100000 liters per year."
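Large numbers are easier to read with thousands separators. As a possible refinement (an assumption about what a reader might want, not part of the original exercise), format() can convert the number to text before it is passed to paste():

```r
Total_use <- 489100000  # value computed in the exercise above

# format() returns text; big.mark adds thousands separators and
# scientific = FALSE keeps the number out of exponent notation
Total_use_text <- format(Total_use, big.mark = ",", scientific = FALSE)

message_text <- paste("The total water consumption in the town is",
                      Total_use_text, "liters per year.")
print(message_text)
```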
We were able to assign single values to variables. But what if we have a number of related values?
Crop1 <- "rice"
Crop2 <- "corn"
Crop3 <- "sugarcane"
Crop4 <- "cassava"
It can be a bit tedious to assign each value to a different variable. Instead, what we can do is to group them:
Crops <- c("rice", "corn", "sugarcane", "cassava")
Crops
## [1] "rice" "corn" "sugarcane" "cassava"
# We can also use the assigned variables to group them together:
Crops2 <- c(Crop1, Crop2, Crop3, Crop4)
Crops2
## [1] "rice" "corn" "sugarcane" "cassava"
A vector is a data structure that holds elements of the same data type.
Note the syntax for a vector: c(Item1, Item2, …), where c stands for “concatenate”, that is, to link things together in a chain or series.
# Numeric vector
v1 <- c(1, 2, 3, 4, 5)
# Character vector
v2 <- c("apple", "banana", "grapes", "cherry", "strawberry")
# Logical vector
v3 <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
Other ways to create vectors:
# Sequence of numbers
v_seq <- seq(0, 50, 2) # Sequence of numbers from 0 to 50, by 2s
# Repeating values
v_rep <- rep(5, times = 4)
We can check the type and length of vectors:
length(Crops) # number of elements
## [1] 4
typeof(Crops) # data type
## [1] "character"
is.vector(Crops) # TRUE if it is a vector
## [1] TRUE
We can use VectorName[index#] to isolate the desired item. This is called “indexing”.
# Using the Crops vector:
Crops[1] # Gets the first element
## [1] "rice"
Crops[4] # Gets the fourth element
## [1] "cassava"
# Accessing multiple values in a vector
Crops[1:2]
## [1] "rice" "corn"
# Overwriting an element in a vector using indexing
Crops[3] <- "dragonfruit"
Crops
## [1] "rice" "corn" "dragonfruit" "cassava"
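Indexing accepts more than a single position. The following sketch recreates the original Crops vector and shows a few other index forms that base R supports:

```r
Crops <- c("rice", "corn", "sugarcane", "cassava")

# A negative index drops that position instead of selecting it
Crops[-1]                  # everything except the first element

# A vector of positions selects several elements in the given order
Crops[c(4, 1)]             # "cassava" "rice"

# A logical condition keeps only the elements where it is TRUE
Crops[Crops != "corn"]

# R counts positions from 1, so asking past the end returns NA
Crops[5]
```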
R is vectorized: it is designed to perform operations on entire vectors of data at once instead of doing one element at a time.
x <- c(1, 2, 3)
y <- c(4, 5, 6)
x + y # 5 7 9
## [1] 5 7 9
x * 2 # 2 4 6
## [1] 2 4 6
x > 2 # FALSE FALSE TRUE
## [1] FALSE FALSE TRUE
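Vectorized operations also work when the two sides have different lengths, because R reuses ("recycles") the shorter one. A short sketch with illustrative values:

```r
x <- c(1, 2, 3, 4)

# A single number is reused for every element
x + 10            # 11 12 13 14

# A shorter vector is repeated until it matches the longer one
x + c(10, 20)     # 11 22 13 24

# Many functions summarise a whole vector in one call
sum(x)            # 10
mean(x)           # 2.5
```

Recycling is convenient, but it can hide mistakes when the lengths differ by accident, so it is worth checking length() when a result looks odd.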
Vectors contain groups of objects in one dimension (column or row).
Matrices contain groups of objects in two dimensions (a grid).
Arrays contain groups of objects in any number of dimensions (i.e., vectors and matrices are just specific types of an array).
A matrix is a 2D structure where all elements must be of the same data type.
There are many ways to initialize a matrix.
# Creating a matrix (2D array)
# Option 1: using array() (since a matrix is an array)
m1 <- array(data = 1:10, dim = c(5, 2))
m1
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
# Option 2: using matrix()
m2 <- matrix(data = 1:10, nrow = 5, byrow = FALSE)
m2
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
We can use class() to see the object’s class (its
behavior or type as seen by users) and/or typeof() to see
the internal storage type that R used for the object.
class(m1)
## [1] "matrix" "array"
typeof(m1)
## [1] "integer"
class(m2)
## [1] "matrix" "array"
typeof(m2)
## [1] "integer"
We can also create a matrix by binding vectors.
# Column-bind
m3 <- cbind(c(1,2), c(3,4))
# Row-bind
m4 <- rbind(c(1,2), c(3,4))
print(m3)
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
print(m4)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
# Try to spot the difference between doing a cbind versus rbind:
# Create a sample matrix
m5 <- matrix(1:20, nrow = 4)
m5
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
# Access elements in the matrix
m5[1, 2] # Row 1, Column 2
## [1] 5
m5[ , 2] # Entire column 2
## [1] 5 6 7 8
m5[4, ] # Entire row 4
## [1] 4 8 12 16 20
# Create matrices
m6 <- matrix(1:4, nrow = 2)
m7 <- matrix(5:8, nrow = 2)
m6
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
m7
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
m6 + m7 # Element-wise addition
## [,1] [,2]
## [1,] 6 10
## [2,] 8 12
m6 * m7 # Element-wise multiplication
## [,1] [,2]
## [1,] 5 21
## [2,] 12 32
dim(m5) # dimensions (number of rows, number of columns)
## [1] 4 5
nrow(m5) # number of rows
## [1] 4
ncol(m5) # number of columns
## [1] 5
rowSums(m5) # sum of each row
## [1] 45 50 55 60
colSums(m5) # sum of each column
## [1] 10 26 42 58 74
rowMeans(m5) # average of each row
## [1] 9 10 11 12
colMeans(m5) # average of each column
## [1] 2.5 6.5 10.5 14.5 18.5
t(m5) # transpose
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [4,] 13 14 15 16
## [5,] 17 18 19 20
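The `*` operator above multiplies matrices cell by cell. For the matrix product from linear algebra, base R provides the `%*%` operator. A brief sketch using the same m6 and m7:

```r
m6 <- matrix(1:4, nrow = 2)
m7 <- matrix(5:8, nrow = 2)

# Element-wise product: each cell is multiplied by the matching cell
elementwise <- m6 * m7

# Matrix product: rows of m6 are combined with columns of m7
matrix_product <- m6 %*% m7
matrix_product
# First row: 23 31; second row: 34 46
```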
Vectors and matrices require that their elements are of the same data type.
What if we want to combine different data types?
A data frame is a 2-dimensional table-like structure:
- Each column is a vector (of the same length)
- Different columns can have different data types (numeric, character, factor, etc.)
It is the most commonly used structure for data sets in R (like Excel sheets).
df <- data.frame(
crop = c("rice", "corn", "sugarcane", "dragonfruit", "cassava"),
weight_kg = c(100, 250, 80, 550, 150),
days_in_storage = c(10, 15, 8, 9, 5)
)
df
## crop weight_kg days_in_storage
## 1 rice 100 10
## 2 corn 250 15
## 3 sugarcane 80 8
## 4 dragonfruit 550 9
## 5 cassava 150 5
# To view the df in a separate tab:
View(df)
df$crop # use dollar sign then the name of the column
## [1] "rice" "corn" "sugarcane" "dragonfruit" "cassava"
df[1, 2] # by row and column index
## [1] 100
df[ , "weight_kg"] # all rows of column weight_kg
## [1] 100 250 80 550 150
df[1, ] # entire first row
## crop weight_kg days_in_storage
## 1 rice 100 10
# Using subset()
subset(df, weight_kg > 100)
## crop weight_kg days_in_storage
## 2 corn 250 15
## 4 dragonfruit 550 9
## 5 cassava 150 5
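The same filtering can be written with bracket indexing, which is useful to recognize in other people's code. This sketch rebuilds the df from above; `heavy` and `heavy_recent` are illustrative names.

```r
df <- data.frame(
  crop = c("rice", "corn", "sugarcane", "dragonfruit", "cassava"),
  weight_kg = c(100, 250, 80, 550, 150),
  days_in_storage = c(10, 15, 8, 9, 5)
)

# Rows where weight_kg is above 100, written with bracket indexing
heavy <- df[df$weight_kg > 100, ]

# Two conditions combined with & (and), keeping only two columns
heavy_recent <- df[df$weight_kg > 100 & df$days_in_storage < 10,
                   c("crop", "weight_kg")]
heavy_recent
```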
# dimensions (number of rows, number of columns)
dim(df)
## [1] 5 3
# structure
str(df)
## 'data.frame': 5 obs. of 3 variables:
## $ crop : chr "rice" "corn" "sugarcane" "dragonfruit" ...
## $ weight_kg : num 100 250 80 550 150
## $ days_in_storage: num 10 15 8 9 5
# summary statistics
summary(df)
## crop weight_kg days_in_storage
## Length:5 Min. : 80 Min. : 5.0
## Class :character 1st Qu.:100 1st Qu.: 8.0
## Mode :character Median :150 Median : 9.0
## Mean :226 Mean : 9.4
## 3rd Qu.:250 3rd Qu.:10.0
## Max. :550 Max. :15.0
# column names
names(df)
## [1] "crop" "weight_kg" "days_in_storage"
# Adding a new column
df$storage_room <- c(1, 1, 2, 3, 4)
df
## crop weight_kg days_in_storage storage_room
## 1 rice 100 10 1
## 2 corn 250 15 1
## 3 sugarcane 80 8 2
## 4 dragonfruit 550 9 3
## 5 cassava 150 5 4
# Renaming a column
# a. Rename a single column by name
names(df)[names(df) == "weight_kg"] <- "weight_tons"
df
## crop weight_tons days_in_storage storage_room
## 1 rice 100 10 1
## 2 corn 250 15 1
## 3 sugarcane 80 8 2
## 4 dragonfruit 550 9 3
## 5 cassava 150 5 4
# b. Rename by column position
names(df)[2] <- "weight_kg"
df
## crop weight_kg days_in_storage storage_room
## 1 rice 100 10 1
## 2 corn 250 15 1
## 3 sugarcane 80 8 2
## 4 dragonfruit 550 9 3
## 5 cassava 150 5 4
# c. Rename multiple columns
names(df) <- c("crop_name", "weight", "number_of_days_stored", "room_no")
df
## crop_name weight number_of_days_stored room_no
## 1 rice 100 10 1
## 2 corn 250 15 1
## 3 sugarcane 80 8 2
## 4 dragonfruit 550 9 3
## 5 cassava 150 5 4
A list in R is a flexible data structure that can hold elements of different types and lengths, including numbers, character strings, logical values, and vectors.
Lists are the building blocks of more complex R objects (like models).
# Creating a simple list
list1 <- list(1, "hello", TRUE, c(2, 3, 4))
list1
## [[1]]
## [1] 1
##
## [[2]]
## [1] "hello"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 2 3 4
# Creating a named list
list2 <- list(crop = "rice",
weight_kg = c(100, 250, 80),
status_in_storage = c(TRUE, FALSE, TRUE),
room_no = "1a")
list2
## $crop
## [1] "rice"
##
## $weight_kg
## [1] 100 250 80
##
## $status_in_storage
## [1] TRUE FALSE TRUE
##
## $room_no
## [1] "1a"
# 1. Using $ for named elements
list2$weight_kg
## [1] 100 250 80
# 2. Using double brackets [[]]
list2[[2]] # gets the second element
## [1] 100 250 80
list2[["weight_kg"]] # gets the "weight_kg" (which is also the second element)
## [1] 100 250 80
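Single brackets also work on lists, but they behave differently from double brackets. A short sketch of the difference, reusing the named list from above:

```r
list2 <- list(crop = "rice",
              weight_kg = c(100, 250, 80),
              status_in_storage = c(TRUE, FALSE, TRUE),
              room_no = "1a")

# Single brackets keep the list wrapper: the result is a list of length 1
class(list2["weight_kg"])      # "list"

# Double brackets (or $) return the element itself
class(list2[["weight_kg"]])    # "numeric"

# So arithmetic works on the extracted element
mean(list2[["weight_kg"]])

# Adding a new named element to the list
list2$notes <- "checked"
names(list2)
```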
We will focus on working with data frames since they will be the usual data structure of most data sets that we will use in our work.
R comes with several built-in data frames that are perfect for learning, testing, and practicing data analysis.
These are preloaded with base R or available in standard packages
like datasets.
# How to see all built-in data sets in R
data()
# A separate tab showing all built-in data sets will come out. View all available data.
# Let us use ChickWeight = Weight versus age of chicks on different diets
# STEP 1: Load the data set
data("ChickWeight") # Load the data set
# STEP 2: See the documentation of the data set
?ChickWeight
# The documentation will appear in the Help tab.
# STEP 3: Explore the data
head(ChickWeight) # shows the first few rows
## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
tail(ChickWeight) # shows the last few rows
## weight Time Chick Diet
## 573 155 12 50 4
## 574 175 14 50 4
## 575 205 16 50 4
## 576 234 18 50 4
## 577 264 20 50 4
## 578 264 21 50 4
str(ChickWeight) # shows the structure
## Classes 'nfnGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 578 obs. of 4 variables:
## $ weight: num 42 51 59 64 76 93 106 125 149 171 ...
## $ Time : num 0 2 4 6 8 10 12 14 16 18 ...
## $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
## $ Diet : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "formula")=Class 'formula' language weight ~ Time | Chick
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "outer")=Class 'formula' language ~Diet
## .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
## - attr(*, "labels")=List of 2
## ..$ x: chr "Time"
## ..$ y: chr "Body weight"
## - attr(*, "units")=List of 2
## ..$ x: chr "(days)"
## ..$ y: chr "(gm)"
summary(ChickWeight) # shows the summary statistics
## weight Time Chick Diet
## Min. : 35.0 Min. : 0.00 13 : 12 1:220
## 1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
## Median :103.0 Median :10.00 20 : 12 3:120
## Mean :121.8 Mean :10.72 10 : 12 4:118
## 3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
## Max. :373.0 Max. :21.00 19 : 12
## (Other):506
Note that summary() can give us the summary statistics
of the data frame but we can also use separate functions to do this, if
needed:
# mean
mean_weight <- mean(ChickWeight$weight)
# median
median_weight <- median(ChickWeight$weight)
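# mode
# Note: mode() in R returns the storage mode of an object (here "numeric"),
# not the statistical mode (the most frequent value).
mode_weight <- mode(ChickWeight$weight)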
# standard deviation
sd_weight <- sd(ChickWeight$weight)
print(mean_weight)
## [1] 121.8183
print(median_weight)
## [1] 103
print(mode_weight)
## [1] "numeric"
print(sd_weight)
## [1] 71.07196
# We assign the values to variables so they are saved in our environment and we can use them later in other computations.
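Base R does not ship a function for the statistical mode (the most frequent value). One possible way to compute it is sketched below; `stat_mode` is a hypothetical helper name, and the approach assumes that returning all tied values is acceptable.

```r
# Hypothetical helper: returns the most frequent value(s) of a vector
stat_mode <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  # table() stores the values as text, so convert back for numeric input
  if (is.numeric(x)) as.numeric(top) else top
}

stat_mode(c(2, 3, 3, 5, 5, 5, 8))   # 5
stat_mode(c(1, 1, 2, 2, 3))         # 1 2 (ties are all returned)

# Applied to the chick weights
data("ChickWeight")
stat_mode(ChickWeight$weight)
```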
We can also use the base R plot() function to
see the matrix of scatterplots, also called a pairs plot.
# STEP 4: Explore the relationships between the numeric columns in the data frame
plot(ChickWeight)
This is a scatterplot matrix of the columns in the data frame. Each cell shows a scatterplot between two variables; factor columns such as Chick and Diet are plotted using their integer codes.
Note that the plot in row 1, column 2 is just a mirror of row 2, column 1 (i.e., axes flipped).
What to look for in each plot:
1. Linear patterns
2. Curved patterns
3. Clusters
4. Outliers
Why is a CSV file preferred over an Excel file?
We mainly use read.csv() to bring .csv files into R.
The tricky part here is identifying the file path to the data.
# We can use getwd() to help identify our working directory, where the .csv file (preferably) should also be located.
getwd()
## [1] "/Users/amyeldalecero/R/TRAINING/Intro_to_R_training/01_Scripts/01_Rmds"
# Let us try importing the provided CSV file.
# Provide the file path to the data:
data <- read.csv("/Users/amyeldalecero/R/TRAINING/Intro_to_R_training/00_Data/data.csv",
header = TRUE)
# Note that this code will not work for you because your file will have a different file path.
# You will have to revise the line of code above to make it (and the rest of the code below) work.
We can use read.table() to import text files into R.
Base R cannot read Excel files directly. We will need external
packages like readxl or openxlsx.
Alternatively, we can save the Excel sheet as a .csv file, although this can be a problem when the data is spread across multiple tabs!
Important reminders:
- Use absolute paths, e.g., C:/Users/yourname/Documents/file.csv
- Or use relative paths (relative to your working directory), e.g., data/file.csv
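To practice importing without depending on anyone's folder layout, the sketch below writes a tiny made-up CSV file into R's temporary folder and reads it back. The file name `example.csv` and its contents are assumptions for illustration only.

```r
# Write a tiny example file into a temporary folder (stand-in for real data)
example_dir <- tempdir()
example_path <- file.path(example_dir, "example.csv")
write.csv(data.frame(crop = c("rice", "corn"), weight_kg = c(100, 250)),
          file = example_path, row.names = FALSE)

# file.exists() is a quick check that the path is right before importing
file.exists(example_path)

# Import using the full path
example_data <- read.csv(example_path, header = TRUE)
example_data

# read.table() can read the same file if we tell it the separator
example_data2 <- read.table(example_path, header = TRUE, sep = ",")
```

Building paths with file.path() avoids typing slashes by hand, which helps when the same script is run on Windows and macOS.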
Once we have imported the file, the first step is to always explore it.
# Explore 'data'
dim(data)
## [1] 11914 16
str(data)
## 'data.frame': 11914 obs. of 16 variables:
## $ Make : chr "BMW" "BMW" "BMW" "BMW" ...
## $ Model : chr "1 Series M" "1 Series" "1 Series" "1 Series" ...
## $ Year : int 2011 2011 2011 2011 2011 2012 2012 2012 2012 2013 ...
## $ Engine.Fuel.Type : chr "premium unleaded (required)" "premium unleaded (required)" "premium unleaded (required)" "premium unleaded (required)" ...
## $ Engine.HP : int 335 300 300 230 230 230 300 300 230 230 ...
## $ Engine.Cylinders : int 6 6 6 6 6 6 6 6 6 6 ...
## $ Transmission.Type: chr "MANUAL" "MANUAL" "MANUAL" "MANUAL" ...
## $ Driven_Wheels : chr "rear wheel drive" "rear wheel drive" "rear wheel drive" "rear wheel drive" ...
## $ Number.of.Doors : int 2 2 2 2 2 2 2 2 2 2 ...
## $ Market.Category : chr "Factory Tuner,Luxury,High-Performance" "Luxury,Performance" "Luxury,High-Performance" "Luxury,Performance" ...
## $ Vehicle.Size : chr "Compact" "Compact" "Compact" "Compact" ...
## $ Vehicle.Style : chr "Coupe" "Convertible" "Coupe" "Coupe" ...
## $ highway.MPG : int 26 28 28 28 28 28 26 28 28 27 ...
## $ city.mpg : int 19 19 20 18 18 18 17 20 18 18 ...
## $ Popularity : int 3916 3916 3916 3916 3916 3916 3916 3916 3916 3916 ...
## $ MSRP : int 46135 40650 36350 29450 34500 31200 44100 39300 36900 37200 ...
head(data)
## Make Model Year Engine.Fuel.Type Engine.HP Engine.Cylinders
## 1 BMW 1 Series M 2011 premium unleaded (required) 335 6
## 2 BMW 1 Series 2011 premium unleaded (required) 300 6
## 3 BMW 1 Series 2011 premium unleaded (required) 300 6
## 4 BMW 1 Series 2011 premium unleaded (required) 230 6
## 5 BMW 1 Series 2011 premium unleaded (required) 230 6
## 6 BMW 1 Series 2012 premium unleaded (required) 230 6
## Transmission.Type Driven_Wheels Number.of.Doors
## 1 MANUAL rear wheel drive 2
## 2 MANUAL rear wheel drive 2
## 3 MANUAL rear wheel drive 2
## 4 MANUAL rear wheel drive 2
## 5 MANUAL rear wheel drive 2
## 6 MANUAL rear wheel drive 2
## Market.Category Vehicle.Size Vehicle.Style highway.MPG
## 1 Factory Tuner,Luxury,High-Performance Compact Coupe 26
## 2 Luxury,Performance Compact Convertible 28
## 3 Luxury,High-Performance Compact Coupe 28
## 4 Luxury,Performance Compact Coupe 28
## 5 Luxury Compact Convertible 28
## 6 Luxury,Performance Compact Coupe 28
## city.mpg Popularity MSRP
## 1 19 3916 46135
## 2 19 3916 40650
## 3 20 3916 36350
## 4 18 3916 29450
## 5 18 3916 34500
## 6 18 3916 31200
tail(data)
## Make Model Year Engine.Fuel.Type Engine.HP
## 11909 Acura ZDX 2011 premium unleaded (required) 300
## 11910 Acura ZDX 2012 premium unleaded (required) 300
## 11911 Acura ZDX 2012 premium unleaded (required) 300
## 11912 Acura ZDX 2012 premium unleaded (required) 300
## 11913 Acura ZDX 2013 premium unleaded (recommended) 300
## 11914 Lincoln Zephyr 2006 regular unleaded 221
## Engine.Cylinders Transmission.Type Driven_Wheels Number.of.Doors
## 11909 6 AUTOMATIC all wheel drive 4
## 11910 6 AUTOMATIC all wheel drive 4
## 11911 6 AUTOMATIC all wheel drive 4
## 11912 6 AUTOMATIC all wheel drive 4
## 11913 6 AUTOMATIC all wheel drive 4
## 11914 6 AUTOMATIC front wheel drive 4
## Market.Category Vehicle.Size Vehicle.Style highway.MPG
## 11909 Crossover,Hatchback,Luxury Midsize 4dr Hatchback 23
## 11910 Crossover,Hatchback,Luxury Midsize 4dr Hatchback 23
## 11911 Crossover,Hatchback,Luxury Midsize 4dr Hatchback 23
## 11912 Crossover,Hatchback,Luxury Midsize 4dr Hatchback 23
## 11913 Crossover,Hatchback,Luxury Midsize 4dr Hatchback 23
## 11914 Luxury Midsize Sedan 26
## city.mpg Popularity MSRP
## 11909 16 204 50520
## 11910 16 204 46120
## 11911 16 204 56670
## 11912 16 204 50620
## 11913 16 204 50920
## 11914 17 61 28995
summary(data)
## Make Model Year Engine.Fuel.Type
## Length:11914 Length:11914 Min. :1990 Length:11914
## Class :character Class :character 1st Qu.:2007 Class :character
## Mode :character Mode :character Median :2015 Mode :character
## Mean :2010
## 3rd Qu.:2016
## Max. :2017
##
## Engine.HP Engine.Cylinders Transmission.Type Driven_Wheels
## Min. : 55.0 Min. : 0.000 Length:11914 Length:11914
## 1st Qu.: 170.0 1st Qu.: 4.000 Class :character Class :character
## Median : 227.0 Median : 6.000 Mode :character Mode :character
## Mean : 249.4 Mean : 5.629
## 3rd Qu.: 300.0 3rd Qu.: 6.000
## Max. :1001.0 Max. :16.000
## NA's :69 NA's :30
## Number.of.Doors Market.Category Vehicle.Size Vehicle.Style
## Min. :2.000 Length:11914 Length:11914 Length:11914
## 1st Qu.:2.000 Class :character Class :character Class :character
## Median :4.000 Mode :character Mode :character Mode :character
## Mean :3.436
## 3rd Qu.:4.000
## Max. :4.000
## NA's :6
## highway.MPG city.mpg Popularity MSRP
## Min. : 12.00 Min. : 7.00 Min. : 2 Min. : 2000
## 1st Qu.: 22.00 1st Qu.: 16.00 1st Qu.: 549 1st Qu.: 21000
## Median : 26.00 Median : 18.00 Median :1385 Median : 29995
## Mean : 26.64 Mean : 19.73 Mean :1555 Mean : 40595
## 3rd Qu.: 30.00 3rd Qu.: 22.00 3rd Qu.:2009 3rd Qu.: 42231
## Max. :354.00 Max. :137.00 Max. :5657 Max. :2065902
##
plot(data)
# See the column names
names(data)
## [1] "Make" "Model" "Year"
## [4] "Engine.Fuel.Type" "Engine.HP" "Engine.Cylinders"
## [7] "Transmission.Type" "Driven_Wheels" "Number.of.Doors"
## [10] "Market.Category" "Vehicle.Size" "Vehicle.Style"
## [13] "highway.MPG" "city.mpg" "Popularity"
## [16] "MSRP"
# Remove duplicate rows
clean_data <- data[!duplicated(data), ]
# Note the changes in the dimensions
dim(data)
## [1] 11914 16
dim(clean_data)
## [1] 11199 16
# Remove any row with one or more NAs
clean_data_NA <- na.omit(clean_data)
dim(clean_data_NA)
## [1] 11100 16
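Before dropping rows, it helps to know how many duplicates and missing values there are and where they sit. A small sketch on a made-up data frame (`toy` and its values are invented for illustration) that mirrors the cleaning steps above:

```r
# Small made-up data frame with one duplicated row and two missing values
toy <- data.frame(
  make  = c("A", "B", "B", "C", "D"),
  hp    = c(100, 150, 150, NA, 200),
  doors = c(2, 4, 4, 4, NA)
)

# How many rows are exact duplicates of an earlier row?
sum(duplicated(toy))            # 1

# How many missing values does each column have?
colSums(is.na(toy))

# Same cleaning steps as above
toy_unique <- toy[!duplicated(toy), ]
toy_clean <- na.omit(toy_unique)
dim(toy_clean)                  # 2 rows, 3 columns
```

Note that na.omit() removes a whole row even when only one column is missing, so it can discard more data than expected.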
Useful references: https://r-graph-gallery.com/base-R.html https://www.sthda.com/english/wiki/r-base-graphs
# Horsepower versus price
plot(x = clean_data_NA$Engine.HP,
y = clean_data_NA$MSRP)
# Adding more elements to make the plot look better
plot(x = clean_data_NA$Engine.HP,
y = clean_data_NA$MSRP,
main = "Horsepower vs Manufacturer's Suggested Retail Price (MSRP)", # title
xlab = "engine horsepower", # x-axis title
ylab = "manufacturer's suggested retail price (MSRP)") # y-axis title
A boxplot is used to visualize the distribution of a numeric variable, showing its median, quartiles, range, and potential outliers.
boxplot(clean_data_NA$MSRP,
ylab = "price")
A histogram is used to visualize the distribution of a numeric variable by dividing it into bins (intervals) and counting how many values fall into each bin.
hist(clean_data_NA$MSRP,
main = "Histogram of price", # title
xlab = "Value", # x-axis label
ylab = "Frequency", # y-axis label
col = "lightblue", # color
border = "black", # border color
breaks = 5) # suggested number of breaks (R may adjust it)
## d. Heat map
# Convert the mtcars data set to a matrix
mtcars_matrix <- as.matrix(mtcars)
mtcars_matrix
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Create a heatmap
heatmap(mtcars_matrix,
main = "Heatmap of mtcars",
col = heat.colors(256),
scale = "column")
Packages provide additional functions, datasets, and tools that are not included in base R.
Installing Tidyverse
tidyverse is a collection of R packages designed for
data science. It includes tools for data manipulation, visualization,
importing, and cleaning.
install.packages("tidyverse")
# Load the library after installing the package so that we can access its functions
library(tidyverse)
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
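Running install.packages() every time a script is run is slow and needs an internet connection. One common pattern, sketched here with a hypothetical helper named `install_if_missing`, is to install only when the package is not yet available:

```r
# Hypothetical helper: install a package only when it is not yet available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")

# For the tidyverse we would call (needs internet access the first time):
# install_if_missing("tidyverse")
```

The library() call is still needed in each new R session; installing only has to happen once per computer.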
Let us use the mtcars data set in R. The
mtcars data set is built into R and contains
information about 32 different car models from the 1970s.
# Check the dimensions of the data set
dim(mtcars)
## [1] 32 11
# Check the column names
colnames(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
# View the first few rows
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# View the structure
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# Get the summary statistics
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
# Check for missing values
sum(is.na(mtcars)) # Total NA values
## [1] 0
colSums(is.na(mtcars)) # NA per column
## mpg cyl disp hp drat wt qsec vs am gear carb
## 0 0 0 0 0 0 0 0 0 0 0
# Use glimpse() from dplyr for a quick overview
glimpse(mtcars)
## Rows: 32
## Columns: 11
## $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
## $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
## $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
## $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
## $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
## $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
# EXAMPLE 1: Filter cars with mpg greater than 20
mtcars %>%
filter(mpg > 20) %>%
head()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# %>% = pipe operator = mechanism for chaining operations, allowing the output of one function to be seamlessly passed as the input to the next
# EXAMPLE 2: Summarize average mpg by number of cylinders
mtcars %>%
group_by(cyl) %>%
summarise(avg_mpg = mean(mpg),
avg_hp = mean(hp),
count = n())
## # A tibble: 3 × 4
## cyl avg_mpg avg_hp count
## <dbl> <dbl> <dbl> <int>
## 1 4 26.7 82.6 11
## 2 6 19.7 122. 7
## 3 8 15.1 209. 14
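For comparison, the same grouped summary can be approximated in base R without loading any package. This is a sketch of one alternative, not a replacement for the dplyr version above:

```r
# Average mpg for each cylinder count, using tapply()
avg_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
round(avg_mpg, 1)

# aggregate() can summarise several columns at once and returns a data frame
avg_by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
avg_by_cyl

# Number of cars in each group
cyl_counts <- table(mtcars$cyl)
cyl_counts
```

The numbers should agree with the tibble above; the dplyr version is usually easier to read once several steps are chained together.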
# EXAMPLE 3: Add a simple calculated column
mtcars_new <- mtcars %>%
mutate(power_to_weight = hp / wt)
head(mtcars_new)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## power_to_weight
## Mazda RX4 41.98473
## Mazda RX4 Wag 38.26087
## Datsun 710 40.08621
## Hornet 4 Drive 34.21462
## Hornet Sportabout 50.87209
## Valiant 30.34682
# EXAMPLE 4: Add a categorical column based on conditions
mtcars_new <- mtcars %>%
mutate(mpg_category = case_when(
mpg >= 25 ~ "High",
mpg >= 15 ~ "Medium",
TRUE ~ "Low"
))
head(mtcars_new)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## mpg_category
## Mazda RX4 Medium
## Mazda RX4 Wag Medium
## Datsun 710 Medium
## Hornet 4 Drive Medium
## Hornet Sportabout Medium
## Valiant Medium
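The first six rows all fall in the "Medium" category, so head() does not show whether the other categories were assigned. A possible check, sketched here with the made-up names `mtcars_cat` and `category_counts`, is to store the new column as an ordered factor and count the cars in each category.

```r
library(dplyr)

mtcars_cat <- mtcars %>%
  mutate(
    mpg_category = case_when(
      mpg >= 25 ~ "High",
      mpg >= 15 ~ "Medium",
      TRUE ~ "Low"
    ),
    # Store the result as an ordered factor so that Low < Medium < High
    mpg_category = factor(mpg_category,
                          levels = c("Low", "Medium", "High"),
                          ordered = TRUE)
  )

# Count the cars in each category (rows follow the factor level order)
category_counts <- mtcars_cat %>% count(mpg_category)
category_counts
```

Because the column is an ordered factor, tables and plots list the categories from Low to High instead of alphabetically.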
# Basic syntax of ggplot()
ggplot(data = mtcars,            # data set
       aes(x = wt, y = mpg)) +   # x and y
  geom_point()                   # geometry
# Customization
ggplot(data = mtcars,                    # data set
       aes(x = wt, y = mpg,              # x and y
           color = factor(cyl))) +       # color based on variable
  geom_point(size = 3) +                 # geometry
  geom_smooth(method = "lm") +           # linear regression line (one per cylinder group)
  labs(                                  # labels
    title = "MPG vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders") +
  theme_minimal()                        # theme
## `geom_smooth()` using formula = 'y ~ x'
# Histogram of miles per gallon (mpg)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10,
                 fill = "steelblue",
                 color = "white") +
  theme_minimal()
# Boxplot of mpg by cylinder
ggplot(mtcars,
       aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(fill = "orange") +
  labs(x = "Cylinders",
       y = "Miles Per Gallon",
       title = "MPG by Cylinder Count") +
  theme_minimal()
# Barplot of car counts by number of gears
ggplot(mtcars,
       aes(x = factor(gear))) +
  geom_bar(fill = "steelblue") +
  labs(title = "Count of Cars by Gear",
       x = "Number of Gears",
       y = "Count") +
  theme_minimal()
# Facet plot: scatterplot faceted by transmission type
ggplot(mtcars,
       aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ am,
             labeller = labeller(am = c("0" = "Automatic", "1" = "Manual"))) +
  labs(title = "MPG vs Weight by Transmission Type") +
  theme_minimal()
### b.6. Heat map
# Install the additional package (only needed once per machine)
install.packages("reshape2")
# Load the library
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
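# Reshape the mtcars data into long format for ggplot
# Add car names as a column (this changes the working copy of mtcars, so later chunks also see the car column)
mtcars$car <- rownames(mtcars)
# melt() stacks every other column into variable/value pairs, one row per car and variable
mtcars_melt <- melt(mtcars, id.vars = "car")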
# Check the data frame
head(mtcars_melt)
## car variable value
## 1 Mazda RX4 mpg 21.0
## 2 Mazda RX4 Wag mpg 21.0
## 3 Datsun 710 mpg 22.8
## 4 Hornet 4 Drive mpg 21.4
## 5 Hornet Sportabout mpg 18.7
## 6 Valiant mpg 18.1
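The same long format can also be produced with pivot_longer() from tidyr, which is already loaded in this session. The sketch below is an alternative to melt(), not a replacement for the code above; `cars_df` and `cars_long` are names invented for the example, and the data is copied first so that mtcars itself is left unchanged.

```r
library(dplyr)
library(tidyr)

# Work on a copy so the built-in mtcars data is not modified
cars_df <- mtcars
cars_df$car <- rownames(cars_df)

cars_long <- cars_df %>%
  # Stack every column except car into variable/value pairs
  pivot_longer(cols = -car, names_to = "variable", values_to = "value") %>%
  # Keep the original column order instead of alphabetical order on a plot axis
  mutate(variable = factor(variable, levels = unique(variable)))

head(cars_long)
```

Unlike melt(), pivot_longer() returns the rows car by car rather than variable by variable, and it stores the variable names as text unless we convert them to a factor as done here.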
# Plot
ggplot(mtcars_melt, aes(x = variable, y = car, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal() +
  labs(title = "Heatmap of mtcars dataset", x = "Variable", y = "Car Model") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
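The heat map above uses one fill scale for all variables, so columns with large raw values (such as disp and hp) are likely to dominate the colors while small-valued columns look almost white. One possible variation, shown here only as a sketch, rescales each variable to the range 0 to 1 before plotting. The names `cars_df`, `cars_scaled`, `scaled` and `heat_scaled` are made up for this example.

```r
library(dplyr)
library(ggplot2)
library(reshape2)

# Work on a copy so the built-in mtcars data is not modified
cars_df <- mtcars
cars_df$car <- rownames(cars_df)

cars_scaled <- melt(cars_df, id.vars = "car") %>%
  group_by(variable) %>%
  # Min-max scaling within each variable: smallest value -> 0, largest -> 1
  mutate(scaled = (value - min(value)) / (max(value) - min(value))) %>%
  ungroup()

heat_scaled <- ggplot(cars_scaled, aes(x = variable, y = car, fill = scaled)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal() +
  labs(title = "Heatmap of mtcars dataset (scaled per variable)",
       x = "Variable", y = "Car Model", fill = "Scaled value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

heat_scaled
```

With this scaling, the colors compare cars within a variable (for example, which car has the highest mpg); they no longer compare raw values across variables.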
# ggplot with multiple aesthetics
# Prepare data: convert cyl and am to factors
mtcars_plot <- mtcars %>%
  mutate(
    cyl = as.factor(cyl),
    am = factor(am, labels = c("Automatic", "Manual"))
  )
# Create complex plot
ggplot(mtcars_plot, aes(x = hp, y = mpg, color = cyl, size = wt)) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "black") +
  facet_wrap(~ am) +
  labs(
    title = "Fuel Efficiency vs Horsepower by Transmission Type",
    subtitle = "Point size represents vehicle weight; color represents cylinder count",
    x = "Horsepower (hp)",
    y = "Miles per Gallon (mpg)",
    color = "Cylinders",
    size = "Weight (1000 lbs)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    strip.text = element_text(face = "bold")
  )
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: size.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## Warning: The following aesthetics were dropped during statistical transformation: size.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
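The warnings above most likely appear because `size = wt` is mapped inside ggplot(), where every layer inherits it, including geom_smooth(), which draws lines rather than points. One way to avoid them, sketched here under that assumption, is to map size only inside geom_point(). The names `plot_data` and `p_fixed` are made up for this example, and the labels and theme are shortened to keep it brief.

```r
library(dplyr)
library(ggplot2)

plot_data <- mtcars %>%
  mutate(
    cyl = as.factor(cyl),
    am = factor(am, labels = c("Automatic", "Manual"))
  )

p_fixed <- ggplot(plot_data, aes(x = hp, y = mpg, color = cyl)) +
  # size is mapped for the points only, so geom_smooth() does not inherit it
  geom_point(aes(size = wt), alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "black") +
  facet_wrap(~ am) +
  labs(color = "Cylinders", size = "Weight (1000 lbs)") +
  theme_minimal()

p_fixed
```

Aesthetics placed in ggplot() apply to all layers, while aesthetics placed inside a geom apply to that layer only. Keeping layer-specific mappings inside the geom that uses them is a common way to prevent this kind of warning.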
# Install plotly (only needed once per machine)
install.packages("plotly")
# Load the library
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Prepare the data (same conversion as in the previous plot)
mtcars_plot <- mtcars %>%
  mutate(
    cyl = as.factor(cyl),
    am = factor(am, labels = c("Automatic", "Manual"))
  )
# Build the plot and store it in a variable; the text aesthetic supplies the tooltip content
p <- ggplot(mtcars_plot, aes(x = hp, y = mpg, color = cyl, size = wt,
                             text = paste("Model:", rownames(mtcars_plot),
                                          "<br>HP:", hp,
                                          "<br>MPG:", mpg,
                                          "<br>Weight:", wt))) +
  geom_point(alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed", color = "black") +
  facet_wrap(~ am) +
  labs(
    title = "Fuel Efficiency vs Horsepower by Transmission Type",
    subtitle = "Point size represents vehicle weight; color represents cylinder count",
    x = "Horsepower (hp)",
    y = "Miles per Gallon (mpg)",
    color = "Cylinders",
    size = "Weight (1000 lbs)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    strip.text = element_text(face = "bold")
  )
# Check the prepared data (am now shows labels, and the car column added earlier is still present)
mtcars_plot
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 Manual 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 Manual 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 Manual 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 Automatic 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 Automatic 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 Automatic 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 Automatic 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 Automatic 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 Automatic 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 Automatic 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 Automatic 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 Automatic 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 Automatic 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 Automatic 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 Automatic 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 Automatic 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 Automatic 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 Manual 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 Manual 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 Manual 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 Automatic 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 Automatic 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 Automatic 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 Automatic 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 Automatic 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 Manual 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 Manual 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 Manual 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 Manual 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 Manual 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 Manual 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 Manual 4 2
## car
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
## Duster 360 Duster 360
## Merc 240D Merc 240D
## Merc 230 Merc 230
## Merc 280 Merc 280
## Merc 280C Merc 280C
## Merc 450SE Merc 450SE
## Merc 450SL Merc 450SL
## Merc 450SLC Merc 450SLC
## Cadillac Fleetwood Cadillac Fleetwood
## Lincoln Continental Lincoln Continental
## Chrysler Imperial Chrysler Imperial
## Fiat 128 Fiat 128
## Honda Civic Honda Civic
## Toyota Corolla Toyota Corolla
## Toyota Corona Toyota Corona
## Dodge Challenger Dodge Challenger
## AMC Javelin AMC Javelin
## Camaro Z28 Camaro Z28
## Pontiac Firebird Pontiac Firebird
## Fiat X1-9 Fiat X1-9
## Porsche 914-2 Porsche 914-2
## Lotus Europa Lotus Europa
## Ford Pantera L Ford Pantera L
## Ferrari Dino Ferrari Dino
## Maserati Bora Maserati Bora
## Volvo 142E Volvo 142E
# Convert to interactive plot
ggplotly(p, tooltip = "text")
## `geom_smooth()` using formula = 'y ~ x'